Abstract

In the last couple of years many works have been devoted to Abelian complexity of words. Recently,
Constantinescu and Ilie (Bulletin EATCS 89, 167-170, 2006) introduced the notion of Abelian
period. We show that a word $w$ of length $n$ over an alphabet of size $\sigma$ can have $\Theta(n^2)$ distinct
Abelian periods. However, to the best of our knowledge, no efficient algorithm is known for
computing these periods. The Brute-Force algorithm computes all the Abelian periods either in
time $O(n^3 \times \sigma)$ using $O(\sigma)$ space or in time $O(n^2 \times \sigma)$ using $O(n \times \sigma)$ space. We present an off-line
algorithm running in time $O(n^2 \times \sigma)$ using $O(n + \sigma)$ space, thus improving the space complexity.
This algorithm is based on a select function. We then present on-line algorithms that also make it
possible to compute all the Abelian periods of all the prefixes of $w$. Experimental results show that the
new off-line algorithm is faster than the Brute-Force one. Moreover, in most cases, one on-line
algorithm, though having a worse worst-case time complexity, is also faster than the Brute-Force one.

Keywords: Abelian period, Abelian repetition, weak repetition, design of algorithms, text 
algorithms, combinatorics on words 



1. Introduction 

An integer $p > 0$ is a (classical) period of a word $w$ of length $n$ if $w[i] = w[i + p]$ for any
$1 \le i \le n - p$. Classical periods have been extensively studied in combinatorics on words [12] due
to their direct applications in data compression and pattern matching.

The Parikh vector of a word w enumerates the cardinality of each letter of the alphabet in w. 
For example, given the alphabet $\Sigma = \{a, b, c\}$, the Parikh vector of the word $w = aaba$ is $(3, 1, 0)$.
The reader can refer to [6] for a list of applications of Parikh vectors.

An integer $p$ is an Abelian period of a word $w$ over a finite alphabet $\Sigma = \{a_1, a_2, \ldots, a_\sigma\}$ if $w$
can be written as $w = u_0 u_1 \cdots u_{k-1} u_k$ where for $0 < i < k$ all the $u_i$'s have the same Parikh vector
$\mathcal{P}$ such that $\sum_{i=1}^{\sigma} \mathcal{P}[i] = p$ and the Parikh vectors of $u_0$ and $u_k$ are contained in $\mathcal{P}$ [9]. For example,
the word $w = ababbbabb$ can be written as $w = u_0 u_1 u_2 u_3$, with $u_0 = a$, $u_1 = bab$, $u_2 = bba$ and
$u_3 = bb$, and 3 is an Abelian period of $w$.

This definition of Abelian period matches the one of weak repetition (also called Abelian power)
when $u_0$ and $u_k$ are the empty word and $k > 2$ [10].

In the last couple of years many works have been devoted to Abelian complexity [TJ [21 HI El 
HH . Efficient algorithms for Abelian pattern matching have been designed [HI El El IS] • 

However, apart from the greedy off-line algorithm given in [10], neither efficient nor on-line algorithms
are known for computing all the Abelian periods of a given word.

In this article we present several efficient off-line and on-line algorithms for computing all the 
Abelian periods of a given word. In Section [2] we give some basic definitions and fix the notation. 
Section [3] presents off-line algorithms while Section [4] presents on-line algorithms. In Section [5] we 
give some experimental results on execution times. Finally, Section [6] contains conclusions and 
perspectives. 

2. Definitions and notation 

Let $\Sigma = \{a_1, a_2, \ldots, a_\sigma\}$ be a finite ordered alphabet of cardinality $\sigma$ and $\Sigma^*$ the set of words
over $\Sigma$. We set $ind(a_i) = i$ for $1 \le i \le \sigma$. We denote by $|w|$ the length of $w$. We write $w[i]$ for
the $i$-th symbol of $w$ and $w[i..j]$ for the factor of $w$ from the $i$-th symbol to the $j$-th symbol, with
$1 \le i \le j \le |w|$. We denote by $|w|_a$ the number of occurrences of the symbol $a \in \Sigma$ in the word $w$.

The Parikh vector of a word $w$, denoted by $\mathcal{P}_w$, counts the occurrences of each letter of $\Sigma$ in
$w$, that is $\mathcal{P}_w = (|w|_{a_1}, \ldots, |w|_{a_\sigma})$. Notice that two words have the same Parikh vector if and only
if one word is a permutation of the other.

We denote by $\mathcal{P}_w(i, m)$ the Parikh vector of the factor of length $m$ beginning at position $i$ in
the word $w$.

Given the Parikh vector $\mathcal{P}_w$ of a word $w$, we denote by $\mathcal{P}_w[i]$ its $i$-th component and by
$|\mathcal{P}_w|$ the sum of its components. Thus for $w \in \Sigma^*$ and $1 \le i \le \sigma$, we have $\mathcal{P}_w[i] = |w|_{a_i}$ and
$|\mathcal{P}_w| = \sum_{i=1}^{\sigma} \mathcal{P}_w[i] = |w|$.

Finally, given two Parikh vectors $\mathcal{P}, \mathcal{Q}$, we write $\mathcal{P} \subset \mathcal{Q}$ if $\mathcal{P}[i] \le \mathcal{Q}[i]$ for every $1 \le i \le \sigma$ and
$|\mathcal{P}| < |\mathcal{Q}|$.

Definition 1 ([9]). A word $w$ has an Abelian period $(h, p)$ if $w = u_0 u_1 \cdots u_{k-1} u_k$ such that:

• $\mathcal{P}_{u_0} \subset \mathcal{P}_{u_1} = \cdots = \mathcal{P}_{u_{k-1}} \supset \mathcal{P}_{u_k}$,

• $|\mathcal{P}_{u_0}| = h$, $|\mathcal{P}_{u_1}| = p$.

We call $u_0$ and $u_k$ resp. the head and the tail of the Abelian period. Notice that the length
$t = |u_k|$ of the tail is uniquely determined by $h$, $p$ and $|w|$, namely $t = (|w| - h) \bmod p$.
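
Definition 1 can be checked directly on small words. The following Python sketch is only an illustration, not one of the algorithms of this paper: the names `contained` and `is_abelian_period` are hypothetical, and it assumes that an Abelian period requires $h < p$ and $h + p \le |w|$ (at least one full block), as in the rest of the text. Positions are 1-based in the paper and 0-based in the code.

```python
from collections import Counter

def contained(small: Counter, big: Counter) -> bool:
    """Componentwise test: small[a] <= big[a] for every letter a."""
    return all(big[a] >= c for a, c in small.items())

def is_abelian_period(w: str, h: int, p: int) -> bool:
    """Check Definition 1 for the pair (h, p) by direct counting."""
    n = len(w)
    if not (0 <= h < p and h + p <= n):
        return False
    block = Counter(w[h:h + p])                  # Parikh vector of u_1
    if not contained(Counter(w[:h]), block):     # head u_0
        return False
    i = h + p
    while i + p <= n:                            # remaining full blocks
        if Counter(w[i:i + p]) != block:
            return False
        i += p
    return contained(Counter(w[i:]), block)      # tail, length (n - h) mod p
```

For instance, `is_abelian_period("ababbbabb", 1, 3)` corresponds to the decomposition $a \cdot bab \cdot bba \cdot bb$ given in the introduction.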

The following lemma gives a bound on the maximum number of Abelian periods of a word. 

Lemma 2.1. The maximum number of Abelian periods for a word of length $n$ over the alphabet $\Sigma$
is $\Theta(n^2)$.

Proof. The word $(a_1 a_2 \cdots a_\sigma)^{n/\sigma}$ has Abelian period $(h, p)$ for any $p \equiv 0 \bmod \sigma$ and every $h$ such
that $0 \le h \le \min(p - 1, n - p)$. □

A natural order can be defined on the Abelian periods. 

Definition 2. Two distinct Abelian periods $(h, p)$ and $(h', p')$ of a word $w$ are ordered as follows:
$(h, p) < (h', p')$ if $p < p'$ or ($p = p'$ and $h < h'$).

We are interested in computing all the Abelian periods of a word. The algorithms we present
in this paper can be easily adapted to give only the smallest Abelian period.

3. Off-line algorithms 

3.1. Brute-Force algorithm 

In Figure 1 we present a Brute-Force algorithm which computes all the Abelian periods of an
input word $w$ of length $n$. For each possible head of length $h$ from 0 to $\lfloor (n - 1)/2 \rfloor$ the algorithm
tests all the possible values of $p$ such that $p > h$ and $h + p \le n$. This is a reformulation of the
algorithm given in [10]. The algorithm easily adapts to give only the smallest Abelian period or
the weak repetitions.

AbelianPeriod-BruteForce(w, n)
 1 for h ← 0 to ⌊(n - 1)/2⌋ do
 2     p ← h + 1
 3     while h + p ≤ n do
 4         if (h, p) is an Abelian period of w then
 5             Output(h, p)
 6         p ← p + 1

Figure 1: Brute-Force algorithm for computing all the Abelian periods of a word w of length n.



Example 1. For w = abaababa the algorithm outputs (1,2), (0,3), (2,3), (1,4), (2,4), (3,4), 
(0,5), (1,5), (2,5), (3,5), (0,6), (1,6), (2,6), (0,7), (1,7) and (0,8). Among these periods (1,2) 
is the smallest. 
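
The enumeration of Figure 1 can be sketched in Python as follows. This is a hedged illustration rather than the implementation used in the experiments: the function names are hypothetical, the period test recounts letters with `collections.Counter` instead of using precomputed Parikh vectors, and the pairs are collected in a list instead of being output.

```python
from collections import Counter

def _is_period(w, h, p):
    """Direct check of Definition 1 (assumes h < p and h + p <= len(w))."""
    n = len(w)
    block = Counter(w[h:h + p])
    fits = lambda part: all(block[a] >= c for a, c in Counter(part).items())
    if not fits(w[:h]):
        return False
    i = h + p
    while i + p <= n:
        if Counter(w[i:i + p]) != block:
            return False
        i += p
    return fits(w[i:])

def abelian_periods_brute_force(w):
    """All pairs (h, p), in the loop order of Figure 1 (h first, then p)."""
    n = len(w)
    found = []
    for h in range((n - 1) // 2 + 1):
        p = h + 1
        while h + p <= n:
            if _is_period(w, h, p):
                found.append((h, p))
            p += 1
    return found

# Sorting by (p, h) gives the order of Definition 2.
periods = sorted(abelian_periods_brute_force("abaababa"),
                 key=lambda hp: (hp[1], hp[0]))
```

On $w = abaababa$ the sorted list contains the sixteen pairs of Example 1, starting with $(1, 2)$.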

Theorem 3.1. The algorithm AbelianPeriod-BruteForce computes all the Abelian periods
of a given word of length $n$ in time $O(n^3 \times \sigma)$ with $O(\sigma)$ space or in time $O(n^2 \times \sigma)$ with $O(n \times \sigma)$
space.

Proof. The correctness of the algorithm comes directly from Definition 1. Each test in line 4 consists
in comparing $n/p$ Parikh vectors. Comparing two Parikh vectors can be done in $O(\sigma)$ time. The
test in line 4 is performed $\sum_{h=0}^{\lfloor (n-1)/2 \rfloor} \sum_{p=h+1}^{n-h} n/p = O(\sum_{h=1}^{n} \sum_{p=h}^{n} n/p) = O(n^2)$ times. With no
preprocessing, this gives an overall time of $O(n^3 \times \sigma)$. If the Parikh vectors of all the prefixes of
the word have been already computed, this can be done by computing the difference between two
Parikh vectors (see [3]). This requires space and time in $O(n \times \sigma)$ and gives an overall time of
$O(n^2 \times \sigma)$. □
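
The preprocessing mentioned in the proof can be sketched as follows, under the assumption that the alphabet is given as a string of distinct letters. The names `prefix_parikh` and `parikh` are hypothetical. The table takes $O(n \times \sigma)$ space, and each $\mathcal{P}_w(i, m)$ is then obtained in $O(\sigma)$ time as a difference of two rows.

```python
def prefix_parikh(w, alphabet):
    """pref[i][k] = number of occurrences of alphabet[k] in w[1..i]."""
    index = {a: k for k, a in enumerate(alphabet)}
    pref = [[0] * len(alphabet)]
    for c in w:
        row = pref[-1][:]          # copy the previous prefix vector
        row[index[c]] += 1
        pref.append(row)
    return pref

def parikh(pref, i, m):
    """Parikh vector P_w(i, m): factor of length m starting at 1-based i."""
    lo, hi = pref[i - 1], pref[i - 1 + m]
    return [b - a for a, b in zip(lo, hi)]

pref = prefix_parikh("abaababa", "ab")
```

With this table, `parikh(pref, 4, 3)` is the Parikh vector of the factor $w[4..6] = aba$.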

3.2. Select-based algorithm 

Let us introduce the select function [17] defined as follows.

Definition 3. Let $w$ be a word of length $n$ over alphabet $\Sigma$, then $\forall a \in \Sigma$:

• $select_a(w, 0) = 0$;

• $\forall\, 1 \le i \le |w|_a$, $select_a(w, i) = j$ iff $j$ is the position of the $i$-th occurrence of letter $a$ in $w$;

• $\forall\, i > |w|_a$, $select_a(w, i)$ is undefined.

In order to compute the select function, we consider an array $S_w$ of $|w|$ elements that stores
the increasing ordered positions of $a_1$, then the increasing ordered positions of $a_2$ and so on up to
the increasing ordered positions of $a_\sigma$. In addition to $S_w$, we also consider an array $C_w$ of $\sigma + 1$
elements defined by: $C_w[1] = 1$, $C_w[i] = \sum_{j=1}^{i-1} |w|_{a_j} + 1$ for $1 < i \le \sigma$ and $C_w[\sigma + 1] = |w| + 1$.
Actually, $C_w[i] - 1$ is the number of letters in $w$ strictly smaller than $a_i$. Array $C_w$ serves as an
index to access $S_w$. Hence, for $i > 0$, we have:

$$select_a(w, i) = \begin{cases} S_w[C_w[ind(a)] + i - 1] & \text{if } i \le C_w[ind(a) + 1] - C_w[ind(a)] \\ \text{undefined} & \text{otherwise.} \end{cases}$$



Example 2. For $w = abaababa$, the select function uses the following three arrays:

        1  2  3  4  5  6  7  8
  w     a  b  a  a  b  a  b  a

        a  b
  ind   1  2

        1  2  3  4  5  6  7  8
  S_w   1  3  4  6  8  2  5  7

        1  2  3
  C_w   1  6  9

Then, for instance, $select_b(w, 2) = S_w[C_w[ind(b)] + 2 - 1] = S_w[7] = 5$, meaning that the second $b$ in $w$
appears in position 5.
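
The two arrays can be built by a counting pass followed by a placement pass. The Python sketch below follows that idea; the names `compute_select` and `select` are hypothetical, the alphabet is assumed to be given as an ordered string, and index 0 of each array is left unused so that the indices match the 1-based notation of Example 2.

```python
def compute_select(w, alphabet):
    """Return (C, S, ind) with 1-based arrays C[1..sigma+1] and S[1..n]."""
    sigma = len(alphabet)
    ind = {a: k + 1 for k, a in enumerate(alphabet)}
    count = [0] * (sigma + 1)
    for c in w:
        count[ind[c]] += 1
    C = [0] * (sigma + 2)
    C[1] = 1
    for i in range(2, sigma + 2):
        C[i] = C[i - 1] + count[i - 1]
    S = [0] * (len(w) + 1)
    filled = [0] * (sigma + 1)          # occurrences already placed per letter
    for pos, c in enumerate(w, start=1):
        k = ind[c]
        S[C[k] + filled[k]] = pos
        filled[k] += 1
    return C, S, ind

def select(C, S, ind, a, i):
    """Position of the i-th occurrence of a, 0 if i == 0, None if undefined."""
    if i == 0:
        return 0
    k = ind[a]
    if i <= C[k + 1] - C[k]:
        return S[C[k] + i - 1]
    return None

C, S, ind = compute_select("abaababa", "ab")
```

Here `None` plays the role of "undefined" in Definition 3.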

Algorithm ComputeSelect (see Figure 2) computes the two arrays $C_w$ and $S_w$ used by the
select function.

Proposition 3.2. Algorithm ComputeSelect runs in $O(n + \sigma)$ time and space.

Proof. The time complexity comes from the fact that the loops in lines 2-3 and 4-5 are executed
$O(\sigma)$ times, the loop in lines 6-8 is executed $n$ times and all the other instructions take constant
time. □

Once the arrays $C_w$ and $S_w$ have been computed, each call to the select function is answered
in constant time.

The Brute-Force algorithm tests all possible pairs $(h, p)$ but it is clear that, for a given $h$, some
pairs cannot be Abelian periods. For example, let $w = abaaaaabaa$ and $h = 2$. Since $\mathcal{P}_w(1, h)$ has
to be included in $\mathcal{P}_w(h + 1, p)$, the pairs $(2, 3)$, $(2, 4)$ and $(2, 5)$ cannot be Abelian periods of $w$:
the minimal $p$ value such that $(2, p)$ can be an Abelian period is in fact 6, in order to include the
second $b$ of $w$. This remark leads us to give the following definitions and propositions.

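Definition 4. Let $w$ be a word of length $n$ on alphabet $\Sigma$. Then $\forall\, 0 \le h \le \lfloor (n - 1)/2 \rfloor$, $\mathcal{M}_w[h]$ is
defined by

$$\mathcal{M}_w[h] = \begin{cases} \min\{p \mid \mathcal{P}_w(1, h) \subset \mathcal{P}_w(h + 1, p)\} & \text{if } \forall a \in \Sigma,\ 2 \times |w[1..h]|_a \le |w|_a \\ -1 & \text{otherwise.} \end{cases}$$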






ComputeSelect(w, n)
 1 C_w[1] ← 1
 2 for i ← 2 to σ + 1 do
 3     C_w[i] ← C_w[i - 1] + P_w[i - 1]
 4 for i ← 1 to σ do
 5     P[i] ← 0
 6 for i ← 1 to n do
 7     S_w[C_w[ind(w[i])] + P[ind(w[i])]] ← i
 8     P[ind(w[i])] ← P[ind(w[i])] + 1
 9 return (C_w, S_w)

Figure 2: Algorithm computing the C_w and S_w arrays.



In other words, if $\forall a \in \Sigma$, $select_a(w, 2 \times |w[1..h]|_a)$ is defined then

$$\mathcal{M}_w[h] = \max\{h + 1,\ \max\{select_a(w, 2 \times |w[1..h]|_a) \mid a \in \Sigma\} - h\},$$

otherwise $\mathcal{M}_w[h] = -1$.
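
The closed form above can be evaluated incrementally, since only the count of the letter $w[h]$ changes when $h$ grows by one. The following Python sketch illustrates this; it is an assumption-laden illustration (hypothetical name `compute_M`, per-letter position lists instead of the arrays $C_w$ and $S_w$) that applies the closed form for every $h$, including $h = 0$, and keeps the value $-1$ for all larger heads once it occurs.

```python
def compute_M(w):
    """M[h] for 0 <= h <= (n - 1) // 2, from the closed form with select."""
    n = len(w)
    pos = {}                              # pos[a][i - 1] = select_a(w, i)
    for j, c in enumerate(w, start=1):
        pos.setdefault(c, []).append(j)
    head = {a: 0 for a in pos}            # letter counts of w[1..h]
    far = 0                               # max_a select_a(w, 2 * head[a])
    M = [1]                               # h = 0: max{1, 0 - 0}
    dead = False
    for h in range(1, (n - 1) // 2 + 1):
        a = w[h - 1]
        head[a] += 1
        if dead or 2 * head[a] > len(pos[a]):
            dead = True                   # select undefined: M[h] = -1
            M.append(-1)
            continue
        far = max(far, pos[a][2 * head[a] - 1])
        M.append(max(h + 1, far - h))
    return M
```

For $w = abaaaaabaa$ the value at $h = 2$ is 6, in agreement with the example preceding Definition 4.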

The algorithm ComputeM(w, n, C_w, S_w) (see Figure 3) processes positions of $w$ from left to
right.

Proposition 3.3. Algorithm ComputeM(w, n, C_w, S_w) computes the array $\mathcal{M}_w$ in time and space
$O(n + \sigma)$.

Proof. The correctness of the algorithm comes directly from Definition 4. The time complexity
comes from the fact that the loop in lines 2-3 is executed $\sigma$ times, the loops in lines 4-9 and 10-12
are executed $O(n)$ times and all the other instructions take constant time. □

Proposition 3.4. Let $w$ be a word of length $n$ on alphabet $\Sigma$ and $0 \le h \le \lfloor (n - 1)/2 \rfloor$. If
$\mathcal{M}_w[h] = -1$, then $\mathcal{M}_w[h'] = -1$ $\forall\, h' \ge h$ and $h'$ cannot be a head of an Abelian period of $w$.

Proof. If $\mathcal{M}_w[h] = -1$, then by definition $\exists\, a \in \Sigma$ such that $2 \times |w[1..h]|_a > |w|_a$. Then, one
cannot find a value $p$ such that $|w[1..h]|_a \le |w[(h + 1)..(h + p)]|_a$. It is clear that this is also true
for any value $h' > h$. □

Consider now the following definition. 

Definition 5. Let $w$ be a word of length $n$ on alphabet $\Sigma$. Then $\forall\, 0 \le h \le \lfloor (n - 1)/2 \rfloor$, $\mathcal{G}_w[h]$ is
defined by

$$\mathcal{G}_w[h] = \max\{select_a(w, i + 1) - select_a(w, i) \mid a \in \Sigma,\ h < select_a(w, i) < select_a(w, i + 1) \le n\}.$$

Actually, $\mathcal{G}_w[h]$ is the maximal value $j - i$ such that $h < i < j$ and $w[i]$, $w[j]$ are two consecutive
occurrences of the same letter. The array $\mathcal{G}_w$ can be computed by the algorithm ComputeG(w, n)
(see Figure 4) processing positions of $w$ from right to left.
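
A right-to-left scan that remembers, for each letter, the position of its next occurrence is enough to fill the array. The sketch below is a minimal Python illustration of that scan; the name `compute_G` is hypothetical, and it fills all entries from $h = 0$ to $h = n$ with the convention (an assumption of this sketch) that the value is 0 when no two equal letters occur after position $h$.

```python
def compute_G(w):
    """G[h] = largest gap between consecutive equal letters at positions > h."""
    n = len(w)
    G = [0] * (n + 1)
    nxt = {}                       # nxt[c] = next occurrence of c to the right
    for h in range(n, 0, -1):      # h is the 1-based position being read
        c = w[h - 1]
        G[h - 1] = G[h]
        if c in nxt:
            G[h - 1] = max(G[h], nxt[c] - h)
        nxt[c] = h
    return G
```

For $w = abaababa$ the largest gap is 3, between the two $b$'s at positions 2 and 5, so the first two entries are 3.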

Proposition 3.5. Algorithm ComputeG(w, n) computes the array $\mathcal{G}_w$ in time and space $O(n + \sigma)$.

ComputeM(w, n, C_w, S_w)
 1 M_w[0] ← 0
 2 for a ∈ Σ do
 3     F[a] ← 0
 4 for h ← 1 to ⌊(n - 1)/2⌋ do
 5     F[w[h]] ← F[w[h]] + 1
 6     s ← select_{w[h]}(w, 2 × F[w[h]])
 7     if s is defined then
 8         M_w[h] ← max{M_w[h - 1] - 1, s - h}
 9     else M_w[h] ← -1
10 for h ← 1 to ⌊(n - 1)/2⌋ do
11     if M_w[h] = h then
12         M_w[h] ← h + 1
13 return M_w

Figure 3: Algorithm computing the M_w array.



Proof. The correctness of the algorithm comes directly from Definition 5. The time complexity
comes from the fact that the loop in lines 2-3 is executed $\sigma$ times, the loop in lines 4-10 is executed
$n$ times and all the other instructions take constant time. □

Proposition 3.6. Let $w$ be a word of length $n$ on alphabet $\Sigma$. Let $0 \le h \le \lfloor (n - 1)/2 \rfloor$. If
$h < p < \max\{\mathcal{M}_w[h], \lfloor (\mathcal{G}_w[h] + 1)/2 \rfloor\}$ then $(h, p)$ is not an Abelian period of $w$.

Proof. From the definition of $\mathcal{M}_w[h]$, it directly follows that if $p < \mathcal{M}_w[h]$, then $(h, p)$ cannot be
an Abelian period of $w$.

Given $h$, let $a \in \Sigma$ be such that there exists $1 \le i < n$ and $select_a(w, i + 1) - select_a(w, i) = \mathcal{G}_w[h]$.
Let $j = select_a(w, i)$ and $j' = select_a(w, i + 1)$. If $p < \lfloor (\mathcal{G}_w[h] + 1)/2 \rfloor$ and $k = \min\{k' \mid h + k'p \ge j\}$
then $h + (k + 1)p < j'$ and $|w[h + kp + 1..h + (k + 1)p]|_a = 0$. Thus $(h, p)$ cannot be an Abelian
period of $w$ (see Figure 5). □

Arrays $\mathcal{M}_w$ and $\mathcal{G}_w$ give, for every head length $h$, a minimal value for a possible $p$ such that
$(h, p)$ can be an Abelian period of $w$. This allows us to skip a number of values of $p$ that cannot give
an Abelian period.

The following lemma shows how to check if (h,p) is indeed an Abelian period of w (except for 
the tail). 

Lemma 3.7. Let $w$ be a word of length $n$ on alphabet $\Sigma$. Let $\mathcal{H} = \mathcal{P}_w(1, h)$ and $\mathcal{P} = \mathcal{P}_w(h + 1, p)$.
Let $i = h + kp$ such that $0 < k$, $p \le n - i$ and $(h, p)$ is an Abelian period of $w[1..i]$ (with an empty
tail). Then the following two points are equivalent:

1. $(h, p)$ is an Abelian period of $w[1..i + p]$.

2. for all $a \in \Sigma$, $select_a(w, \mathcal{H}[ind(a)] + (1 + \lfloor i/p \rfloor) \times \mathcal{P}[ind(a)]) \le i + p$.

ComputeG(w, n)
 1 G_w[n] ← 0
 2 for a ∈ Σ do
 3     T[a] ← 0
 4 for h ← n to 1 do
 5     if T[w[h]] = 0 then
 6         T[w[h]] ← h
 7         G_w[h - 1] ← G_w[h]
 8     else d ← T[w[h]] - h
 9         T[w[h]] ← h
10         G_w[h - 1] ← max{G_w[h], d}
11 return G_w

Figure 4: Algorithm computing the G_w array.



Figure 5: If the distance between two consecutive a's in w is greater than 2p then (h, p) cannot be an Abelian period
of w for any h < p.



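Proof. Since $(h, p)$ is an Abelian period of $w[1..i]$ with $i = h + kp$ for some $k > 0$ then $|w[1..i]|_a =
\mathcal{H}[ind(a)] + k \times \mathcal{P}[ind(a)]$ for each letter $a \in \Sigma$. Notice that since $h < p$ then $k = \lfloor i/p \rfloor$.

(1 ⇒ 2). The fact that $(h, p)$ is an Abelian period of $w[1..i + p]$ implies that, for all $a \in \Sigma$,
$|w[1..i + p]|_a = \mathcal{H}[ind(a)] + (k + 1) \times \mathcal{P}[ind(a)]$. Thus, by definition of select,
$select_a(w, \mathcal{H}[ind(a)] + (1 + \lfloor i/p \rfloor) \times \mathcal{P}[ind(a)]) \le i + p$.

(2 ⇒ 1). The fact that $select_a(w, \mathcal{H}[ind(a)] + (1 + \lfloor i/p \rfloor) \times \mathcal{P}[ind(a)]) \le i + p$ implies that
$|w[1..i + p]|_a \ge \mathcal{H}[ind(a)] + (k + 1) \times \mathcal{P}[ind(a)]$ for all $a \in \Sigma$. Since both sides sum to $i + p$ over
all the letters, equality holds for every letter: $|w[1..i + p]|_a = \mathcal{H}[ind(a)] + (k + 1) \times \mathcal{P}[ind(a)]$.
We know that $|w[1..i]|_a = \mathcal{H}[ind(a)] + k \times \mathcal{P}[ind(a)]$. By
difference, $|w[i + 1..i + p]|_a = \mathcal{P}[ind(a)]$. Since it is true for all $a \in \Sigma$, $\mathcal{P}_w(i + 1, p) = \mathcal{P}$ and then
$(h, p)$ is an Abelian period of $w[1..i + p]$. □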

Figure 6 presents the algorithm AbelianPeriod-Shift based on the previous lemma.

Proposition 3.8. Algorithm AbelianPeriod-Shift(h, p, w, n, C_w, S_w) returns true iff $(h, p)$ is
an Abelian period of the prefix of length $n - ((n - h) \bmod p)$ of $w$ in time $O(\frac{n}{p} \times \sigma)$ and space $O(\sigma)$.

Proof. The correctness comes directly from Lemma 3.7. The while loop in line 3 is executed $n/p$
times and the for loop in line 4 is executed $\sigma$ times, thus the time complexity is $O(\frac{n}{p} \times \sigma)$. This
algorithm only requires the storage of the two Parikh vectors $\mathcal{P}_w(1, h)$ and $\mathcal{P}_w(h + 1, p)$. These
vectors can be stored in space $O(\sigma)$ under the standard assumption that $\log n$ fits in a computer
word. □

AbelianPeriod-Shift(h, p, w, n, C_w, S_w)
 1 (H, P) ← (P_w(1, h), P_w(h + 1, p))
 2 i ← h + p
 3 while i + p ≤ n do
 4     for a ∈ Σ do
 5         s ← select_a(w, H[ind(a)] + (1 + ⌊i/p⌋) × P[ind(a)])
 6         if s is undefined or s > i + p then
 7             return FALSE
 8     i ← i + p
 9 return TRUE

Figure 6: Algorithm checking whether (h, p) is an Abelian period of the prefix of length n - ((n - h) mod p) of w.



Using Proposition 3.6 and Proposition 3.8, algorithm AbelianPeriod-Select, given in Figure 7,
computes all the Abelian periods of a word $w$ of length $n$.

AbelianPeriod-Select(w, n)
 1 (C_w, S_w) ← ComputeSelect(w, n)
 2 M_w ← ComputeM(w, n, C_w, S_w)
 3 G_w ← ComputeG(w, n)
 4 h ← 0
 5 while h ≤ ⌊(n - 1)/2⌋ and M_w[h] ≠ -1 do
 6     p ← max(M_w[h], ⌊(G_w[h] + 1)/2⌋)
 7     while h + p ≤ n do
 8         if AbelianPeriod-Shift(h, p, w, n, C_w, S_w) then
 9             t ← (n - h) mod p
10             if P_w(n - t + 1, t) ⊂ P_w(h + 1, p) then
11                 Output(h, p)
12         p ← p + 1
13     h ← h + 1

Figure 7: Algorithm computing all the Abelian periods of word w of length n, based on the select function.
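
The pieces of this section can be assembled into a compact Python sketch. It is an illustration under several assumptions, not the C implementation used in the experiments: select is answered from per-letter position lists rather than from $C_w$ and $S_w$, the lower bound $\mathcal{M}_w[h]$ is evaluated from its closed form (so the starting value of $p$ is at least $h + 1$), the Parikh vector of the first block is maintained incrementally with `collections.Counter`, and the name `abelian_periods_select` is hypothetical.

```python
from collections import Counter

def abelian_periods_select(w):
    """Sketch of the select-based enumeration; returns the pairs (h, p)."""
    n = len(w)
    pos = {}                                   # pos[a][i - 1] = select_a(w, i)
    for j, c in enumerate(w, start=1):
        pos.setdefault(c, []).append(j)

    def select(a, i):
        if i == 0:
            return 0
        return pos[a][i - 1] if i <= len(pos[a]) else None

    # G[h]: largest gap between consecutive equal letters at positions > h.
    G, nxt = [0] * (n + 1), {}
    for h in range(n, 0, -1):
        c = w[h - 1]
        G[h - 1] = max(G[h], nxt[c] - h) if c in nxt else G[h]
        nxt[c] = h

    def shift_ok(h, p, head, block):
        # Lemma 3.7: extend the period block by block using select only.
        i = h + p
        while i + p <= n:
            for a in pos:
                s = select(a, head[a] + (1 + i // p) * block[a])
                if s is None or s > i + p:
                    return False
            i += p
        return True

    periods, head, far = [], Counter(), 0
    for h in range((n - 1) // 2 + 1):
        if h > 0:
            a = w[h - 1]
            head[a] += 1
            s = select(a, 2 * head[a])
            if s is None:                      # M_w[h] = -1: stop (Prop. 3.4)
                break
            far = max(far, s)
        p = max(h + 1, far - h, (G[h] + 1) // 2)
        block = Counter(w[h:h + p])            # Parikh vector P_w(h + 1, p)
        while h + p <= n:
            if shift_ok(h, p, head, block):
                t = (n - h) % p
                tail = Counter(w[n - t:])
                if all(block[a] >= c for a, c in tail.items()):
                    periods.append((h, p))
            if h + p < n:
                block[w[h + p]] += 1           # grow the block for p + 1
            p += 1
    return periods
```

The pairs are produced head by head; sorting them by $(p, h)$ gives the order of Definition 2, and on $w = abaababa$ the result coincides with Example 1.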



Theorem 3.9. Algorithm AbelianPeriod-Select computes all the Abelian periods of word $w$
of length $n$ in time $O(n^2 \times \sigma)$ and space $O(n + \sigma)$.

Proof. The correctness of the algorithm comes from Proposition 3.6 and Proposition 3.8.

The select function and the arrays $\mathcal{M}_w$ and $\mathcal{G}_w$ can be computed in $O(n + \sigma)$ time and space.
According to Proposition 3.6 the value of $p$ computed in line 6 is the minimal value such that
$(h, p)$ can be an Abelian period of $w$. The AbelianPeriod-Shift function, called in line 8,
simply verifies that $(h, p)$ is an Abelian period of $w$ in time $O(\frac{n}{p} \times \sigma)$. The test in line 10 is done
in $O(p)$ time. The complexity of the while loop in line 7 is $O(\sum_{p=h+1}^{n} \frac{n}{p}) = O(n)$. Consequently,
algorithm AbelianPeriod-Select computes all the Abelian periods of $w$ in time $O(n^2 \times \sigma)$ and
space $O(n + \sigma)$ (output periods are not stored). □



4. On-line algorithms 



We now propose three on-line algorithms to compute all the Abelian periods of a word $w$ using
dynamic programming. When processing $w[i]$, in the first algorithm, using a two dimensional array,
we inspect all the possible values $(h, p)$; in the second algorithm we use lists, while in the third
one, using heaps, we inspect the Abelian periods of $w[1..i - 1]$ by groups built depending upon
the tail length of the periods.

The following proposition states that if $(h, p)$ is not an Abelian period of a word $w$, with
$h + p \le n = |w|$, then it cannot be an Abelian period of any word having $w$ as prefix.

Proposition 4.1. Let $w$ be a word of length $n$ and let $h, p$ be such that $h + p \le n$. If $(h, p)$ is not an
Abelian period of $w$, then $(h, p)$ is not an Abelian period of $wa$ for any letter $a \in \Sigma$.

Proof. If $(h, p)$ is not an Abelian period of $w$, at least one of the following three cases holds:

1. $\mathcal{P}_w(1, h) \not\subset \mathcal{P}_w(h + 1, p)$;

2. there exist two distinct indices $h < i, i' \le |w| - p + 1$ such that $i = kp + h + 1$ and $i' = k'p + h + 1$
with $k$ and $k'$ two integers and $\mathcal{P}_w(i, p) \ne \mathcal{P}_w(i', p)$;

3. $t = (|w| - h) \bmod p$ and $\mathcal{P}_w(|w| - t + 1, t) \not\subset \mathcal{P}_w(|w| - p - t + 1, p)$.

If case 1 holds then $\mathcal{P}_{wa}(1, h) \not\subset \mathcal{P}_{wa}(h + 1, p)$ and $(h, p)$ is not an Abelian period of $wa$. If case
2 holds then $\mathcal{P}_{wa}(i, p) \ne \mathcal{P}_{wa}(i', p)$ and $(h, p)$ is not an Abelian period of $wa$. If case 3 holds then
$\mathcal{P}_{wa}(|w| - t + 1, t + 1) \not\subset \mathcal{P}_{wa}(|w| - p - t + 1, p)$ and $(h, p)$ is not an Abelian period of $wa$. □

4.1. Two-dimensional array

We now propose an algorithm (given in Figure 8) that uses a two dimensional array and
Proposition 4.1 to compute all the Abelian periods of an input word $w$ in an on-line manner. It
processes the positions of $w$ in increasing order. When processing position $i$, $T[h, p] = j$ iff $w[1..j]$
is the longest prefix of $w[1..i]$ having Abelian period $(h, p)$. Thus if $j = i - 1$ the algorithm checks
whether $w[1..i]$ has Abelian period $(h, p)$ and updates $T[h, p]$ accordingly.

When $T[h, p] = i$ it means that $w[1..i]$ is the longest prefix of $w$ that has $(h, p)$ as an Abelian
period. Thus when $T[h, p] = n$ it means that $(h, p)$ is an Abelian period of $w$.
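
The update rule just described can be sketched in Python as follows. This is a hedged illustration: the table is a dictionary keyed by $(h, p)$ instead of a two dimensional array, Parikh vectors are recounted with `collections.Counter` at each test, the new pairs $(h, i - h)$ are initialised for every $h < i - h$, and the name `online_period_table` is hypothetical.

```python
from collections import Counter

def online_period_table(w):
    """T[(h, p)] = length of the longest prefix of w with period (h, p)."""
    n = len(w)

    def parikh(i, m):                      # P_w(i, m), 1-based start i
        return Counter(w[i - 1:i - 1 + m])

    def strictly_contained(small, big):    # the relation written ⊂ in the text
        return (sum(small.values()) < sum(big.values())
                and all(big[a] >= c for a, c in small.items()))

    T = {(0, 1): 1}
    for i in range(2, n + 1):
        # Try to extend every pair that is still alive on w[1..i-1].
        for p in range(1, i):
            for h in range(0, min(p - 1, i - p - 1) + 1):
                if T.get((h, p)) == i - 1:
                    d = (i - h) % p
                    if d != 0:             # non-empty tail of length d
                        if strictly_contained(parikh(i - d + 1, d),
                                              parikh(i - d - p + 1, p)):
                            T[h, p] = i
                    elif parikh(i - p + 1, p) == parikh(i - 2 * p + 1, p):
                        T[h, p] = i        # a full block has just been closed
        # New pairs whose first block ends exactly at position i.
        for h in range(0, (i - 1) // 2 + 1):
            ok = strictly_contained(parikh(1, h), parikh(h + 1, i - h))
            T[h, i - h] = i if ok else -1
    return T

T = online_period_table("abaababa")
```

The pairs with `T[h, p] == len(w)` are the Abelian periods of the whole word; for $w = abaababa$ they are the sixteen pairs of Example 1.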

Example 3. For $w = abaababa$, the algorithm computes the following array $T$:

  h\p   1  2  3  4  5  6  7  8
  0     1  3  8  6  8  8  8  8
  1        8  6  8  8  8  8
  2           8  8  8  8
  3              8  8






AbelianPeriod-array(w, n)
 1 T[0, 1] ← 1
 2 for i ← 2 to n do
 3     for p ← 1 to i - 1 do
 4         for h ← 0 to min{p - 1, i - p - 1} do
 5             if T[h, p] = i - 1 then
 6                 d ← (i - h) mod p
 7                 if d ≠ 0 then
 8                     if P_w(i - d + 1, d) ⊂ P_w(i - d - p + 1, p) then
 9                         T[h, p] ← i
10                 else if P_w(i - p + 1, p) = P_w(i - 2 × p + 1, p) then
11                     T[h, p] ← i
12     for h ← 0 to ⌊(i - 1)/2⌋ do
13         if P_w(1, h) ⊂ P_w(h + 1, i - h) then
14             T[h, i - h] ← i
15         else T[h, i - h] ← -1
16 return T

Figure 8: On-line dynamic programming algorithm for computing all the Abelian periods of a word w of length n
using an array.



Cells $T[h, p] = |w|$ correspond to pairs $(h, p)$ output by algorithm AbelianPeriod-BruteForce
of Example 1. Empty cells on the left part of the array correspond to cases where $h \ge p$ and empty
cells on the right part correspond to cases where $h + p > |w|$.

Theorem 4.2. The algorithm AbelianPeriod-array computes all the Abelian periods of a given
word of length $n$ in time $\Theta(n^3 \times \sigma)$ and space $O(n^2)$.

Proof. The correctness of the algorithm comes from Proposition 4.1. The time complexity of the
algorithm is due to the three for loops of lines 2 to 4. The space complexity is due to the array
$T$. □

4.2. Lists 

The algorithm given in Figure 9 also processes the positions of $w$ in increasing order. When
processing position $i$ it only stores pairs $(h, \ell)$ such that $w[1..i - 1]$ has Abelian period $\ell$ with head
$w[1..h]$.

Theorem 4.3. The algorithm AbelianPeriod-list computes all the Abelian periods of a given
word of length $n$ in time $O(n^3 \times \sigma)$ and space $O(n^2)$.

Proof. The correctness of the algorithm comes from Proposition 4.1. The space complexity for the
list $L$ is given by Lemma 2.1. The time complexity of the algorithm is due to the two for loops of
lines 2 and 4 and the maximal number of elements in the list $L$. □

AbelianPeriod-list(w, n)
 1 L ← {(0, 1)}
 2 for i ← 2 to n do
 3     L' ← ∅
 4     for all (h, ℓ) ∈ L do
 5         d ← (i - h) mod ℓ
 6         if d ≠ 0 then
 7             if P_w(i - d + 1, d) ⊂ P_w(i - d - ℓ + 1, ℓ) then
 8                 L' ← L' ∪ {(h, ℓ)}
 9         else if P_w(i - ℓ + 1, ℓ) = P_w(i - 2 × ℓ + 1, ℓ) then
10             L' ← L' ∪ {(h, ℓ)}
11     L' ← L' ∪ {(0, i)}
12     for h ← 1 to ⌊(i - 1)/2⌋ do
13         if P_w(1, h) ⊂ P_w(h + 1, i - h) then
14             L' ← L' ∪ {(h, i - h)}
15     L ← L'

Figure 9: On-line dynamic programming algorithm for computing all the Abelian periods of a word w of length n
using lists.



4.3. Heaps

The following proposition shows that the set of Abelian periods of a prefix of a word can be
partitioned into subsets depending on the length of the tail. In some cases all the periods of a
subset can be processed at once by inspecting only the smallest period of the subset.

Proposition 4.4. Let $w$ have $s$ Abelian periods $(h_1, p_1) < (h_2, p_2) < \cdots < (h_s, p_s)$ such that
$(|w| - h_i) \bmod p_i = t > 0$ for $1 \le i \le s$. For any letter $a \in \Sigma$, if $(h_1, p_1)$ is an Abelian period of
$wa$ then $(h_2, p_2), \ldots, (h_s, p_s)$ are also Abelian periods of $wa$.

Proof. Since $(h_1, p_1) < (h_2, p_2) < \cdots < (h_s, p_s)$ are Abelian periods of $w$, we have that
$w = u_{i,0} u_{i,1} \cdots u_{i,k_i - 1} u_{i,k_i}$ with $|u_{i,0}| = h_i$, $|u_{i,j}| = p_i$ and $|u_{i,k_i}| = t$ for $1 \le i \le s$ and
$1 \le j < k_i$. If $(h_1, p_1)$ is an Abelian period of $wa$, $\mathcal{P}_{u_{1,k_1} a} \subset \mathcal{P}_{u_{1,k_1 - 1}}$. Since $|u_{1,k_1}| = |u_{i,k_i}|$ and
$|u_{1,k_1 - 1}| \le |u_{i,k_i - 1}|$, $\mathcal{P}_{u_{i,k_i} a} \subset \mathcal{P}_{u_{i,k_i - 1}}$ for $2 \le i \le s$. Thus $(h_2, p_2), \ldots, (h_s, p_s)$ are Abelian
periods of $wa$ (see Figure 10). □



Figure 10: $w = u_{i,0} u_{i,1} \cdots u_{i,k_i - 1} u_{i,k_i}$, $u_{i,k_i} = z$ for $1 \le i \le s$. If $\mathcal{P}_{za} \subset \mathcal{P}_{u_{1,k_1 - 1}}$ then
$\mathcal{P}_{za} \subset \mathcal{P}_{u_{i,k_i - 1}}$ for every $2 \le i \le s$.

The algorithm given in Figure 11 uses Proposition 4.4 for computing all the Abelian periods
by gathering all the ongoing periods $(h, p)$ with the same tail length together in a heap where the
element at the root of the heap is the smallest period.

When processing $w[i]$, the algorithm processes every heap $H$ for the different tail lengths:

• if the period $(h, p)$ at the root of $H$ is a period of $w[1..i]$ then by Proposition 4.4 all the
elements of $H$ are Abelian periods of $w[1..i]$. If the tail length becomes equal to $p$ then $(h, p)$
is removed from the current heap and is moved into a new heap corresponding to the empty
tail.

• if the period $(h, p)$ at the root of $H$ is not a period of $w[1..i]$ then it is removed from $H$ and
the same process is applied until a pair $(h', p')$ is an Abelian period of $w[1..i]$ or the heap
becomes empty. This is realized by function ExtractUntilOK in line 8.

Then, all the degenerate cases $(h, p)$ such that $h < p$ and $h + p = i$ have to be inserted in the heap
corresponding to the empty tail (lines 12 to 15).

The function Root(H) returns the smallest element of the heap $H$, the function Insert(H, e)
inserts element $e$ in the heap $H$, while the function Remove(H) removes the smallest element of
the heap $H$.

AbelianPeriod-heap(w, n)
 1 L ← list with one heap containing (0, 1)
 2 for i ← 2 to n do
 3     NewHeap ← ∅
 4     for all H ∈ L do
 5         (h, p) ← Root(H)
 6         t ← p - ((i - h) mod p)
 7         if P_w(i - t + 1, t) ⊄ P_w(i - t - p + 1, p) then
 8             ExtractUntilOK(H)
 9         else if t = p then
10             Remove(H)
11             Insert(NewHeap, (h, p))
12     h ← 0
13     while h ≤ ⌊(i - 1)/2⌋ and P_w(1, h) ⊂ P_w(h + 1, i - h) do
14         Insert(NewHeap, (h, i - h))
15         h ← h + 1
16     L ← L ∪ NewHeap
17 return L

Figure 11: On-line algorithm for computing all the Abelian periods of a word w of length n using heaps.



Theorem 4.5. The algorithm AbelianPeriod-heap computes all the Abelian periods of a given
word of length $n$ in time $O(n^2 \times (n \log n) \times \sigma)$ and space $O(n^2)$.

Proof. The correctness of the algorithm comes from Proposition 4.4. The maximum number of
heaps is $n/2$ and the total number of elements of all the heaps is $O(n^2)$ (Lemma 2.1). The space
complexity for the list $L$ is $O(n^2)$. The time complexity of the algorithm is due to the two for loops
of lines 2 and 4 and the different calls to ExtractUntilOK in line 8 and Insert and Remove.
The maximum number of heaps is $n/2$, and the maximum number of elements in a single heap
is $n$. Thus, the total complexity for the calls to ExtractUntilOK, Insert and Remove in a
single run of the for loop of line 4 is $O(n \log n)$. □



5. Experimental results 



To compare practical performances of the different algorithms, they have been implemented 
in C in a homogeneous way and run on test sets of random words (1000 words each) of different 
lengths (from 10 to 2000) on different alphabet sizes (2, 3, 4, 8 and 16). 

Tests were performed on a computer running Mac OS X with a 2.2 GHz processor and 2 GB 
RAM. 



Figure 12(a) presents average running times over 1000 random words on alphabet size 16 of
the algorithms AbelianPeriod-BruteForce, AbelianPeriod-Select and AbelianPeriod-heap.
The results show that, as expected, the off-line algorithm using the select function is indeed
faster than the other ones. Moreover, our tests show that, for long words, the on-line algorithm
using heaps becomes faster than the Brute-Force one. One can notice that the difference of running
times between the three algorithms increases as the word length grows. Results for other alphabet
sizes, natural language texts or genomic sequences are not shown since they are similar to these
ones.




Figure 12: (a) Average running times (in ms) over 1000 random words, of the Brute-Force, select-based and heaps-
based algorithms on alphabet size 16. (b) Average running times (in ms) over 1000 random words, of the Brute-Force
and select-based algorithms on alphabet size 16, in the case where $h + 2p \le |w|$. (Horizontal axis: word length.)



6. Conclusion and perspectives 

In this paper we presented different algorithms to compute all the Abelian periods of a word.
This is the first attempt to give algorithms for computing all the Abelian periods of a word. In
particular, we give an $O(n^2 \times \sigma)$ time off-line algorithm requiring $O(n + \sigma)$ space, thus reducing the
space complexity compared to the Brute-Force algorithm. Moreover, in practice, this algorithm
appears to be faster. It is even faster when one wants to compute Abelian periods $(h, p)$ of a word
$w$ with at least two repetitions of the period, i.e., with $h + 2p \le |w|$ (see Figure 12(b)).

As shown in Lemma 2.1 the number of Abelian periods of a word can be quadratic. Nevertheless,
some periods exist just as a consequence of the existence of smaller ones. For instance, in
the word $w = abaababa$ of Example 1, the fact that $(1, 4)$, $(1, 6)$, $(3, 4)$ are Abelian periods for $w$
is just a consequence of the fact that $(1, 2)$ is. So, let us define the cutting positions of an Abelian
period $(h, p)$ as follows:

$$Cut_w(h, p) = \{k = h + jp \mid 1 \le k \le |w| \text{ and } 0 \le j\}.$$

We say that an Abelian period $(h, p)$ of $w$ is non-deducible if there does not exist another Abelian
period $(h', p')$ of $w$ such that $Cut_w(h, p) \subset Cut_w(h', p')$. Anyway, even the number of non-deducible
Abelian periods can be quadratic (consider for instance the word $a^n b a^n b a^n$).
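
The cutting positions are easy to compute, which gives a direct way to filter out deducible periods from a list of Abelian periods. The Python sketch below is only an illustration: the names `cut_positions` and `non_deducible` are hypothetical, and it assumes that the inclusion in the definition is a proper one (Python's `<` on sets).

```python
def cut_positions(n, h, p):
    """Cut_w(h, p) for |w| = n: the positions h + j*p in [1, n], j >= 0."""
    return {k for k in range(h, n + 1, p) if k >= 1}

def non_deducible(n, periods):
    """Periods whose cut set is not properly included in another one."""
    cuts = {hp: cut_positions(n, *hp) for hp in periods}
    return [hp for hp in periods
            if not any(cuts[hp] < cuts[other]
                       for other in periods if other != hp)]

# The sixteen Abelian periods of w = abaababa listed in Example 1.
example_periods = [(1, 2), (0, 3), (2, 3), (1, 4), (2, 4), (3, 4), (0, 5),
                   (1, 5), (2, 5), (3, 5), (0, 6), (1, 6), (2, 6), (0, 7),
                   (1, 7), (0, 8)]
kept = non_deducible(8, example_periods)
```

For $w = abaababa$, $Cut_w(1, 2) = \{1, 3, 5, 7\}$ properly contains the cut sets of $(1, 4)$, $(1, 6)$ and $(3, 4)$, so these three pairs are filtered out while $(1, 2)$ is kept.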

It also remains to obtain a bound on the minimal Abelian period given a word length and an 
alphabet size. Simple modifications of the presented algorithms would allow one to compute the 
minimal Abelian period of each factor of a word. 

It seems quite clear that balanced words (words such that for any letter $a \in \Sigma$ the difference
of the number of $a$'s in any two factors of the same length is bounded by 1) are the words with
the maximum number of Abelian periods. In particular, binary balanced words, i.e., factors of
Sturmian words, deserve to be further investigated. Our experiments show that, for example, the
prefixes of the Fibonacci word have as Abelian periods only Fibonacci numbers, with four possible
heads. We think that the Abelian periods of binary balanced words can be precisely characterized
by means of arithmetic relations.